Fail-Aware Failure Detectors
Abstract
In existing asynchronous distributed systems it is impossible to implement perfect failure detectors, i.e. detectors that suspect only crashed processes and eventually suspect all crashed processes. Some recent research has nevertheless proposed that any “reasonable” failure detector for solving the election problem must be perfect. We address this problem by introducing two new classes of fail-aware failure detectors that 1) are implementable in existing asynchronous distributed systems, 2) are not necessarily perfect, and 3) can be used to solve the election problem. In particular, we show that there exists a fail-aware failure detector that allows the election problem to be solved and that is strictly weaker than a Perfect failure detector.
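As a rough illustration of the definition of "perfect" used in the abstract, the sketch below checks the two stated properties against a recorded suspicion trace. The trace format and all function names are assumptions made for this example and do not come from the paper. Because "eventually" cannot be decided on a finite trace, the second check only approximates it by looking at the last sample.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical trace format (an assumption for illustration only):
# crash_time maps each crashed process to the time at which it crashed;
# a trace is a time-ordered list of (time, set of suspected processes)
# samples taken from one failure detector module.
Trace = List[Tuple[int, Set[str]]]


def only_suspects_crashed(crash_time: Dict[str, int], trace: Trace) -> bool:
    """Accuracy: every suspected process had already crashed when suspected."""
    for t, suspected in trace:
        for p in suspected:
            if p not in crash_time or crash_time[p] > t:
                return False
    return True


def suspects_all_crashed(crash_time: Dict[str, int], trace: Trace) -> bool:
    """Finite-trace stand-in for completeness: at the last sample, every
    process that crashed by then is suspected."""
    if not trace:
        # With no samples, the check passes only if nothing crashed.
        return not crash_time
    last_t, final = trace[-1]
    return all(p in final for p, c in crash_time.items() if c <= last_t)


def looks_perfect(crash_time: Dict[str, int], trace: Trace) -> bool:
    """Both properties hold on this finite trace."""
    return (only_suspects_crashed(crash_time, trace)
            and suspects_all_crashed(crash_time, trace))


if __name__ == "__main__":
    crashes = {"q": 5}
    # q is suspected only after it crashed.
    print(looks_perfect(crashes, [(1, set()), (6, {"q"})]))
    # q is suspected at time 3, before its crash at time 5.
    print(looks_perfect(crashes, [(3, {"q"}), (6, {"q"})]))
```

A timeout-based detector in an asynchronous system can fail the first check, since a slow but live process may be suspected; this is the kind of wrong suspicion that makes a perfect detector unimplementable there.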
Related resources
Computer Science and Artificial Intelligence Laboratory Impossibility of Boosting Distributed Service Resilience
We prove two theorems saying that no distributed system in which processes coordinate using reliable registers and f-resilient services can solve the consensus problem in the presence of f + 1 undetectable process stopping failures. (A service is f-resilient if it is guaranteed to operate as long as no more than f of the processes connected to it fail.) Our first theorem assumes that the give...
Survey on Scalable Failure Detectors
Maintaining a timely view of the current system status is essential to the performance and functionality of distributed systems. Failure detectors have long been essential to distributed systems. In this paper, we evaluate two failure detection algorithms specifically aimed at large-scale systems. Both assume fail-stop (non-Byzantine) models but the similarities end there. Dynamo’s failure dete...
Derivation of Fail-Aware Membership Service Specifications
We derive the specification of a primary partition and a partitionable fail-aware node membership service in a top-down fashion. The derived specifications are fail-aware in the sense that each client of a membership server can learn if the server currently provides its standard semantics or an exception semantics because too many failures have occurred. We first propose the specification of an idea...
Failure Detectors for Large-Scale Distributed Systems
This paper discusses the problem of implementing a scalable failure detection service for Grid systems. More specifically, traditional implementations of failure detectors are often tuned for running over local networks and fail to address some important problems found in wide-area distributed systems, such as Grid systems. We identify some of the most important problems raised in the context o...
IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES Fail-Aware Untrusted Storage
We consider a set of clients collaborating through an online service provider that is subject to attacks, and hence not fully trusted by the clients. We introduce the abstraction of a fail-aware untrusted service, with meaningful semantics even when the provider is faulty. In the common case, when the provider is correct, such a service guarantees consistency (linearizability) and ...
Publication date: 1996